Corpus-Oriented Development of Japanese HPSG Parsers
Abstract
This paper reports the corpus-oriented development of a wide-coverage Japanese HPSG parser. We first created an HPSG treebank from the EDR corpus by using heuristic conversion rules, and then extracted lexical entries from the treebank. The grammar developed using this method attained wide coverage that could hardly be obtained by conventional manual development. We also trained a statistical parser for the grammar on the treebank, and evaluated the parser in terms of the accuracy of semantic-role identification and dependency analysis.
Similar resources
An Agent-based Parallel HPSG Parser for Shared-memory Parallel Machines
We describe an agent-based parallel HPSG parser that operates on shared-memory parallel machines. It efficiently parses real-world corpora by using a wide-coverage HPSG grammar. The efficiency is due to the use of a parallel parsing algorithm and the efficient treatment of feature structures. The parsing algorithm is based on the CKY algorithm, in which resolving constraints between a mother an...
Ambiguous Part-of-Speech Tagging for Improving Accuracy and Domain Portability of Syntactic Parsers
We aim to improve the performance of a syntactic parser that uses a part-of-speech (POS) tagger as a preprocessor. Pipelined parsers consisting of POS taggers and syntactic parsers have several advantages, such as the capability of domain adaptation. However, the performance of such systems on raw texts tends to be disappointing as they are affected by the errors of automatic POS tagging. We att...
Efficient HPSG Parsing Algorithm with Array Unification
This paper presents a method for improving parsing performance of parsers for HPSG. The method was obtained by extending Torisawa’s parsing method for HPSG. His parsing method utilizes a CFG compiled from a given HPSG-based grammar, and the parser predicts the possible parse trees with the CFG. Since the amount of unification is reduced because of this prediction, parsing performance is improve...
Constituency Parsing of Bulgarian: Word- vs Class-based Parsing
In this paper, we report the results obtained with two constituency parsers trained on BulTreeBank, an HPSG-based treebank for Bulgarian. To reduce the data sparsity problem, we propose using Brown word clustering to cluster the words off-line and map the words in the treebank to their classes, creating a class-based treebank. The observations show that when the classes outnumber the POS tags, the results are...
HPSG-based annotation scheme for corpora development and parsing evaluation
This paper proposes a formal framework for development and exploitation of a corpus, based on the HPSG linguistic theory. The formal representation of the annotation scheme facilitates the annotation process and ensures the quality of the corpus and its usage in different application scenarios. Also, evaluation over HPSG annotation scheme is discussed. The advantages of the approach are present...